Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers
Abstract
Discourse markers are universal linguistic events subject to language variation. Although an extensive literature has already reported language-specific traits of these events (e.g. [6,7,4,3,9]), little has been said about their cross-language behavior and, consequently, about building an inventory of multilingual lexica of discourse markers. This work therefore describes new methods and approaches for the description, classification, and annotation of discourse markers in the specific domain of the Europarl corpus. Studying discourse markers in the context of translation is crucial because of the idiomatic nature of these structures (e.g. [1,2]). Multilingual lexica, together with a functional analysis of such structures, are useful tools for the hard task of translating discourse markers into possible equivalents from one language to another.
Similar resources
Towards producing bilingual lexica from monolingual corpora
Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embed...
A Knowledge-Modeling Approach for Multilingual Regulus Lexica
Development of lexical resources is, along with grammar development, one of the main efforts when building multilingual NLP applications. In this paper, we present a tool-based approach for more efficient manual lexicon development for a spoken language translation system. The approach in particular addresses the common problems of multilingual lexica including the redundancy of encoded informa...
From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
In the present paper, we analyse variation of discourse phenomena in two typologically different languages, i.e. in German and Czech. The novelty of our approach lies in the nature of the resources we are using. Advantage is taken of existing resources, which are, however, annotated on the basis of two different frameworks. We use an interoperable scheme unifying discourse phenomena in both fra...
A Bilingual Discourse Corpus and Its Applications
Existing discourse research focuses only on the monolingual setting, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation. To address this issue, we design and build a bilingual discourse corpus in which we are currently defining and annotating the bilingual elementary discourse units (BEDUs). The BEDUs are th...
Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation
Many discourse connectives can signal several types of relations between sentences. Their automatic disambiguation, i.e. the labeling of the correct sense of each occurrence, is important for discourse parsing, but could also be helpful to machine translation. We describe new approaches for improving the accuracy of manual annotation of three discourse connectives (two English, one French) by u...
Journal title:
- CoRR
Volume: abs/1503.09144
Pages: -
Publication date: 2015